Using the geometric distribution, we can determine how likely it is that we find at least 1 planet with life within a certain number of trials. For those familiar with the exponential distribution (for continuous random variables), the geometric distribution is its counterpart for discrete random variables.
To avoid potential confusion, the geometric distribution used here is sometimes referred to as the shifted geometric distribution. It is defined as the probability distribution of the number of Bernoulli trials needed to get one success. A Bernoulli trial is one that can only result in a success or a failure. The distribution takes a single parameter, p, the probability of success, which is constant for every trial.
In essence, once we have a value for p (the probability of success on each trial) we can either fix the number of trials and solve for the probability that at least 1 success occurs, or we can fix the probability that at least 1 success occurs and solve for the number of trials required to achieve said probability.
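The probability that the first success occurs on exactly the n-th trial is given by the probability mass function (PMF):
$$P(\text{first success on trial } n) = (1 - p)^{n-1} \, p$$
The formula multiplies the probability of failing on each of the first n-1 trials by the probability of succeeding on the n-th trial.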
This, however, does not quite answer our question. This formula only finds the probability of the first success occurring on exactly the n-th trial. We want the probability of the first success occurring on or before the n-th trial. We can achieve this by taking the sum of the PMF probabilities from 1 to n.
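The probability of the first success occurring by (on or before) the n-th trial is given by the cumulative distribution function (CDF):
$$P(\text{first success by trial } n) = 1 - (1 - p)^{n}$$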
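For readers who want the intermediate step, the closed form can be recovered by summing the PMF over trials 1 to n as a finite geometric series. The summation index k is introduced here only for illustration:

$$
\sum_{k=1}^{n} (1 - p)^{k-1} \, p \;=\; p \cdot \frac{1 - (1 - p)^{n}}{1 - (1 - p)} \;=\; 1 - (1 - p)^{n}
$$

Equivalently, the only way to see no success in n trials is to fail all n of them, which has probability (1 - p)^n, and the result is the complement of that event.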
Now we can plug in values for p and n and get the probability of the first success on or before the n-th trial.
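As a quick numerical check, the two formulas can be sketched in Python. The function names `geometric_pmf` and `geometric_cdf` and the sample values p = 0.3, n = 7 are chosen here for illustration. The sketch sums the PMF over trials 1 to n and compares the total with the closed-form CDF.

```python
def geometric_pmf(n: int, p: float) -> float:
    """Probability that the first success occurs on exactly trial n."""
    return (1.0 - p) ** (n - 1) * p


def geometric_cdf(n: int, p: float) -> float:
    """Probability that the first success occurs on or before trial n."""
    return 1.0 - (1.0 - p) ** n


# Illustrative values (assumptions for this sketch).
p, n = 0.3, 7

# Summing the PMF over trials 1..n should agree with the closed-form CDF.
summed = sum(geometric_pmf(k, p) for k in range(1, n + 1))
closed_form = geometric_cdf(n, p)

print(f"sum of PMF: {summed:.7f}")
print(f"CDF:        {closed_form:.7f}")
```

Both lines should print the same value up to floating-point rounding.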
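To answer our second question, we can set the CDF equal to a chosen probability x and rearrange to solve for n:
$$n = \frac{\log(1 - x)}{\log(1 - p)}$$
Any logarithm base works, as long as the same base is used in the numerator and the denominator.
Note: n will usually not be a whole number, so we always round up, since we cannot have a partial trial.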
After plugging in values for x and p, we obtain a value for n. This is the number of trials needed in order to have a certain chance (x) of getting at least 1 success within n trials.
For example: Let's say your favorite sports team has a 30% chance of winning any individual game. This is p, the probability of success. We want to know how many games your favorite team would have to play in order to have a 90% chance of winning at least 1 game within that span. This 90% represents x, our pre-selected probability, and the number of games is n. When we plug in 0.9 for x and 0.3 for p in our equation for n, we end up with n ≈ 6.46. Since your team cannot win a game that is not completed, n is rounded up to 7. Thus, 7 is the number of games your favorite team would have to play so that the chance of winning at least 1 game is at least 90%. Rounding up brings the actual chance to about 91.8%.
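The sports example can be reproduced with a short Python sketch. The helper name `trials_needed` is hypothetical. It uses `math.log1p(-v)` to compute log(1 - v), which is more accurate than `math.log(1 - v)` when v is very small.

```python
import math


def trials_needed(x: float, p: float) -> int:
    """Smallest whole number of trials n such that the chance of at
    least one success within n trials is at least x."""
    if not (0.0 < x < 1.0) or not (0.0 < p < 1.0):
        raise ValueError("x and p must both lie strictly between 0 and 1")
    # log1p(-v) evaluates log(1 - v); round up because trials are whole.
    return math.ceil(math.log1p(-x) / math.log1p(-p))


# Sports example: p = 0.3 chance per game, target x = 0.9.
raw_n = math.log1p(-0.9) / math.log1p(-0.3)
games = trials_needed(0.9, 0.3)

print(f"unrounded n: {raw_n:.2f}")  # 6.46
print(f"games needed: {games}")     # 7
```

Playing 6 games gives 1 - 0.7^6 ≈ 88.2%, which falls short of the target, so 7 is the first whole number of games that reaches it.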
The geometric distribution has a memorylessness property. This property essentially states that previous failed trials do not affect the probability of success on future trials. You may have heard the terms "due" or "overdue" used in such a context. After many failed trials, a success is not more likely or "due to happen" because of "unluckiness" that has already occurred. For instance, if there is a 50% chance of getting at least 1 success over the next 10 trials and a success does not occur, the next 10 trials after that still have a 50% chance of getting at least 1 success. The memorylessness property is saying that being "due" for a success is not how probability works.
For example: Going back to your favorite sports team from earlier, if they end up losing the first 6 games that they play, they are not more likely to win the seventh game. The probability of success, p, remains constant at 30% for the next game. Likewise, after losing the first 6 games, the chance of winning at least 1 of the next 7 games is exactly what it was for the first 7 games: still at least 90% (about 91.8%), with p equal to 0.3.
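The memorylessness property can be checked numerically. The sketch below uses a hypothetical helper, `prob_no_success`. It computes the chance of at least 1 win in the next 7 games, conditional on having lost the first 6, and compares it with the unconditional chance over any 7 games.

```python
def prob_no_success(n: int, p: float) -> float:
    """Probability of failing on all of the first n trials."""
    return (1.0 - p) ** n


p = 0.3          # chance of winning any single game
lost_so_far = 6  # games already lost
next_games = 7   # window we look at next

# P(at least 1 win in games 7..13 | lost games 1..6)
#   = 1 - P(no win in 13 games) / P(no win in 6 games)
conditional = 1.0 - (
    prob_no_success(lost_so_far + next_games, p)
    / prob_no_success(lost_so_far, p)
)

# P(at least 1 win in any fresh run of 7 games)
unconditional = 1.0 - prob_no_success(next_games, p)

print(f"conditional:   {conditional:.7f}")
print(f"unconditional: {unconditional:.7f}")
```

The two values agree up to floating-point rounding, because the factor (1 - p)^6 cancels in the ratio.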
For our purposes, we can treat p as the product of the orange and red factors, which represents the probability of life on an individual planet. One trial involves checking a planet and verifying if it has life or not. Assuming that we have the capabilities to do so, a planet with life is classified as a success and a planet without life is classified as a failure.
Pfeifer's estimate for the red factor does not have a finite value. In order to carry out these calculations, p must be finite, so we will have to substitute finite values for this factor. If you are using your own estimates, you do not have to worry about this as long as all of your estimates are finite.
For now, let's say P(Life Forms) = 0.1
The product of this and the orange factors results in a probability of life on an individual planet equal to 0.00000035, which is roughly 1 in every 2.857 million planets. This is p, the probability of success for each planet.
If we were to check all 5678 exoplanets (as of July 2024) for life with p = 0.00000035, the CDF gives:
$$1 - (1 - 0.00000035)^{5678} = 0.001985327$$
That is roughly a 0.2% chance of finding at least 1 planet with life.
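With a p this small, it is worth taking a little care over floating-point precision. The sketch below is one possible way to evaluate the CDF, not the method used to produce the figure above. The helper name `prob_at_least_one` is hypothetical. It uses `math.log1p` and `math.expm1` so that the tiny difference from 1 is not lost to rounding.

```python
import math


def prob_at_least_one(n: int, p: float) -> float:
    """Chance of at least one success within n trials: 1 - (1 - p)^n.

    Rewritten as -expm1(n * log1p(-p)), which stays accurate
    when p is very small.
    """
    return -math.expm1(n * math.log1p(-p))


p = 0.00000035  # probability of life on an individual planet (from the text)
n = 5678        # exoplanets as of July 2024 (from the text)

chance = prob_at_least_one(n, p)
print(f"{chance:.9f}")  # approximately 0.001985
```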
Now let's choose a probability x of finding at least 1 planet with life and solve for the number of planets n that we would need to check. p is still equal to 0.00000035. If we choose x = 50%, solving for n and rounding up gives:
$$n = \left\lceil \frac{\log(1 - 0.5)}{\log(1 - 0.00000035)} \right\rceil = 1{,}980{,}421$$
A table has been provided below showing how many planets we would have to check to have certain probabilities of finding at least 1 planet with life for various values of P(Life Forms) using Pfeifer's factor estimates.
P(Life Forms) | x = 10% | x = 50% | x = 90% | x = 99% |
---|---|---|---|---|
1.00 | 30,103 | 198,042 | 657,881 | 1,315,761 |
0.75 | 40,138 | 264,056 | 877,175 | 1,754,349 |
0.50 | 60,206 | 396,084 | 1,315,762 | 2,631,524 |
0.25 | 120,412 | 792,168 | 2,631,525 | 5,263,050 |
0.10 | 301,030 | 1,980,421 | 6,578,814 | 13,157,627 |
0.05 | 602,061 | 3,960,841 | 13,157,628 | 26,315,256 |
0.01 | 3,010,301 | 19,804,205 | 65,788,145 | 131,576,289 |
0.001 | 30,103,005 | 198,042,052 | 657,881,454 | 1,315,762,908 |
That is a lot of planets. Keep in mind that Pfeifer's estimate for the red factor, P(Life Forms), is many orders of magnitude smaller than the example values shown above. In turn, that would make the results in the table many orders of magnitude greater.
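The table above can be regenerated with a few lines of Python. The constant `ORANGE_PRODUCT = 3.5e-6` is an assumption: the product of the orange factors is not stated directly here, so it is inferred from p = 0.00000035 at P(Life Forms) = 0.1. The names `trials_needed` and `build_table` are hypothetical.

```python
import math

# Assumption: inferred from p = 0.00000035 when P(Life Forms) = 0.1.
ORANGE_PRODUCT = 3.5e-6

TARGETS = (0.10, 0.50, 0.90, 0.99)  # chosen probabilities x
LIFE_FORMS = (1.00, 0.75, 0.50, 0.25, 0.10, 0.05, 0.01, 0.001)


def trials_needed(x: float, p: float) -> int:
    """Smallest whole n with at least an x chance of one or more successes."""
    return math.ceil(math.log1p(-x) / math.log1p(-p))


def build_table() -> dict[float, list[int]]:
    """Map each P(Life Forms) value to its row of planet counts."""
    return {
        lf: [trials_needed(x, lf * ORANGE_PRODUCT) for x in TARGETS]
        for lf in LIFE_FORMS
    }


table = build_table()
for lf, row in table.items():
    print(f"{lf:<6} | " + " | ".join(f"{n:,}" for n in row))
```

Swapping in your own factor estimates only requires changing `ORANGE_PRODUCT` and `LIFE_FORMS`.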
In real life we are not able to instantly check whether a planet has life or not. If we were to start sending probes to exoplanets to scan for life, it would take years, because other stars are many light years away. We can calculate how many years it would take to check a certain number of planets if we define a rate at which they are checked. There is no real basis for estimating this rate, since no exoplanet has ever been checked this way, so any value is guesswork.
$$y = \frac{n}{r}$$
Here y is the time in years, n is the number of planets to check, and r is the number of planets checked per year.
The top left number in the table above, 30,103, represents the number of planets to check in order to have a 10% chance of finding at least 1 with life, given that P(Life Forms) = 100%. This is the most optimistic value for the life forms factor. If we checked 10 planets per year, it would take 3010.3 years to check 30,103 planets. If no planets with life are found in that time, there would still be a 10% chance of finding at least 1 over the next 30,103 planets that are checked.
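The time estimate is a single division, sketched below with a hypothetical helper, `years_to_check`. The rate of 10 planets per year is the guess used in the example above, not a measured figure.

```python
def years_to_check(n_planets: int, planets_per_year: float) -> float:
    """Years needed to check n_planets at a constant yearly rate."""
    if planets_per_year <= 0:
        raise ValueError("planets_per_year must be positive")
    return n_planets / planets_per_year


# Most optimistic table entry: 30,103 planets at a guessed 10 per year.
years = years_to_check(30_103, 10)
print(f"{years:.1f} years")  # 3010.3 years
```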
Made by Nicholas Pfeifer